We provide formulations and open-source tools to achieve embedded model predictive control of sensor/actuator systems using learned forward kinematics and on-device computation. Microcontroller units (MCUs) that compute the prediction and control tasks while co-located with the sensors and actuators enable embedded, untethered behaviors. In this approach, small-parameter-size neural network models learn the forward kinematics offline. Our open-source compiler, NN4MC, generates code to offload these predictions onto MCUs. A Newton-Raphson solver then computes the control input in real time. We first benchmark this nonlinear control approach against a PID controller on a mass-spring-damper simulation. We then study experimental results on two experimental rigs with different sensing, actuation, and computational hardware: a tendon-based platform with embedded light sensors and a HASEL-based platform with magnetic sensors. Experimental results indicate effective high-bandwidth tracking of reference paths (greater than or equal to 120 Hz) with a small memory footprint (less than or equal to 6.4% of flash memory). In the tendon-based platform, the measured path-following error does not exceed 2 mm; in the HASEL-based platform, the simulated path-following error does not exceed 1 mm. The average power consumption of this approach on an ARM Cortex-M4F device is 45.4 mW. This control approach is also compatible with TensorFlow Lite models and equivalent on-device code. In-material intelligence enables a new class of composites that infuse autonomy into structures and systems with refined artificial proprioception.
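As a rough illustration of the control loop described above (not the authors' implementation), the sketch below assumes a learned forward model `f(u)` that maps a control input to a predicted sensor state and uses a Newton-Raphson iteration with a finite-difference Jacobian to drive the prediction toward a reference point; the function names, the toy model, and the least-squares step are illustrative assumptions.

```python
import numpy as np

def newton_raphson_control(f, u0, target, iters=5, eps=1e-4):
    """Solve f(u) = target for the control input u via Newton-Raphson.

    f      : learned forward model mapping control input -> predicted state
    u0     : initial guess for the control input (1-D array)
    target : reference point on the desired path (1-D array)
    """
    u = np.asarray(u0, dtype=float)
    for _ in range(iters):
        residual = f(u) - target
        # Finite-difference Jacobian of the learned forward model.
        J = np.stack([(f(u + eps * e) - f(u)) / eps for e in np.eye(u.size)], axis=1)
        # Least-squares Newton step (handles non-square Jacobians).
        step, *_ = np.linalg.lstsq(J, residual, rcond=None)
        u = u - step
    return u

# Example with a toy "learned" forward model standing in for the neural network.
f = lambda u: np.tanh(u) * 2.0
u_cmd = newton_raphson_control(f, u0=np.zeros(2), target=np.array([1.0, 0.5]))
```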
We humans are entering a virtual era and indeed want to bring animals into the virtual world as well. Yet computer-generated (CGI) furry animals are limited by tedious offline rendering, let alone interactive motion control. In this paper, we present Artemis, a novel neural modeling and rendering pipeline for generating articulated neural pets with appearance and motion synthesis. Artemis enables interactive motion control, real-time animation, and photorealistic rendering of furry animals. At the core of Artemis is a neural-generated (NGI) animal engine, which adopts an efficient octree-based representation for animal animation and fur rendering. Animation then becomes equivalent to voxel-level deformation based on explicit skeletal warping. We further use fast octree indexing and an efficient volumetric rendering scheme to generate appearance and density feature maps. Finally, we propose a novel shading network to generate high-fidelity details of appearance and opacity from the appearance and density feature maps. For the motion control module in Artemis, we combine state-of-the-art animal motion capture methods with recent neural character control schemes. We introduce an effective optimization scheme to reconstruct the skeletal motion of real animals captured by a multi-view RGB and Vicon camera array. We feed all captured motions into a neural character control scheme to generate abstract control signals with motion styles. We further integrate Artemis into existing engines that support VR headsets, providing an unprecedented immersive experience in which a user can intimately interact with a variety of virtual animals with vivid movements and photorealistic appearance. Our Artemis model and dynamic furry animal dataset are available at https://haiminluo.github.io/publication/artemis/.
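As a loose illustration of the "voxel-level deformation based on explicit skeletal warping" mentioned above, the sketch below applies plain linear blend skinning to voxel centers; the transforms, skinning weights, and function names are hypothetical and not taken from the Artemis codebase.

```python
import numpy as np

def warp_voxels(voxel_centers, bone_transforms, skin_weights):
    """Warp canonical voxel centers with linear blend skinning.

    voxel_centers   : (N, 3) canonical-space voxel centers
    bone_transforms : (B, 4, 4) per-bone rigid transforms for the current pose
    skin_weights    : (N, B) per-voxel skinning weights (rows sum to 1)
    """
    homo = np.concatenate([voxel_centers, np.ones((len(voxel_centers), 1))], axis=1)
    # Per-bone warped positions: (B, N, 4) -> keep xyz only.
    per_bone = np.einsum('bij,nj->bni', bone_transforms, homo)[..., :3]
    # Blend the per-bone results with the skinning weights.
    return np.einsum('nb,bni->ni', skin_weights, per_bone)

# Toy example: 2 voxels, 2 bones (identity and a translation along x).
centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
T = np.stack([np.eye(4), np.eye(4)])
T[1, 0, 3] = 0.5  # second bone translates by 0.5 along x
weights = np.array([[1.0, 0.0], [0.2, 0.8]])
print(warp_voxels(centers, T, weights))
```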
We explore the use of machine learning techniques to remove the response of large-volume $\gamma$-ray detectors from experimental spectra. Segmented $\gamma$-ray total absorption spectrometers (TAS) allow for the simultaneous measurement of individual $\gamma$-ray energy ($E_\gamma$) and total excitation energy ($E_x$). Analysis of TAS detector data is complicated by the fact that the $E_x$ and $E_\gamma$ quantities are correlated, so techniques that unfold the $E_x$ and $E_\gamma$ response functions independently are less accurate. In this work, we investigate conditional generative adversarial networks (cGANs) to simultaneously unfold the $E_x$ and $E_\gamma$ data in TAS detectors. Specifically, we employ a pix2pix cGAN, a generative modeling technique based on recent advances in deep learning, to treat the $(E_x, E_\gamma)$ matrix unfolding as an image-to-image translation problem. We present results for simulated and experimental matrices of single-$\gamma$ and double-$\gamma$ decay cascades. Our model demonstrates characterization capabilities within detector resolution limits for 90% of simulated test cases.
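The following sketch illustrates the general pix2pix-style training objective (adversarial plus L1) applied to folded-versus-true $(E_x, E_\gamma)$ matrices treated as single-channel images; the tiny convolutional generator/discriminator and all hyperparameters are placeholders, not the networks used in this work.

```python
import torch
import torch.nn as nn

# Tiny stand-ins for the pix2pix generator/discriminator (the real ones are a
# U-Net and a PatchGAN); the layer sizes here are illustrative only.
G = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 1, 3, padding=1))
D = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 1, 3, padding=1))  # conditioned on the folded input

adv_loss, l1_loss = nn.BCEWithLogitsLoss(), nn.L1Loss()
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(folded, true_matrix, lam=100.0):
    """One pix2pix step: folded (Ex, Egamma) matrix -> unfolded matrix."""
    fake = G(folded)
    # Discriminator: real vs. generated, conditioned on the folded input.
    d_real = D(torch.cat([folded, true_matrix], dim=1))
    d_fake = D(torch.cat([folded, fake.detach()], dim=1))
    d_loss = adv_loss(d_real, torch.ones_like(d_real)) + \
             adv_loss(d_fake, torch.zeros_like(d_fake))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # Generator: fool the discriminator and stay close to the true matrix (L1).
    d_fake = D(torch.cat([folded, fake], dim=1))
    g_loss = adv_loss(d_fake, torch.ones_like(d_fake)) + lam * l1_loss(fake, true_matrix)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return g_loss.item(), d_loss.item()

# Toy batch: (Ex, Egamma) histograms as 64x64 single-channel images.
folded = torch.rand(4, 1, 64, 64)
true_matrix = torch.rand(4, 1, 64, 64)
train_step(folded, true_matrix)
```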
Schemata are structured representations of complex tasks that can aid artificial intelligence by allowing models to break complex tasks into intermediate steps. We propose a novel system that induces schemata from web videos and generalizes them to capture unseen tasks, with the goal of improving video retrieval performance. Our system proceeds in three main phases: (1) given a task with related videos, we construct an initial schema for the task using a joint video-text model to match video segments with text from WikiHow; (2) we generalize schemata to unseen tasks by leveraging language models to edit the text in existing schemata; through generalization, our schemata can cover a broader range of tasks with a small amount of learning data; (3) we perform zero-shot instructional video retrieval with the unseen task names as queries. Our schema-guided approach outperforms existing video retrieval methods, and we show that the schemata induced by our system are better than those generated by other models.
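As a minimal sketch of phase (3), zero-shot retrieval can be realized as cosine-similarity ranking between an embedding of the unseen task name (or its edited schema text) and video embeddings from a joint video-text model; the embeddings below are random stand-ins and the function names are assumptions, not the authors' system.

```python
import numpy as np

def retrieve(query_emb, video_embs, top_k=5):
    """Rank videos by cosine similarity to a task-name or schema embedding.

    query_emb  : (d,) embedding of the unseen task name / its edited schema text
    video_embs : (N, d) embeddings of candidate videos from a joint video-text model
    """
    q = query_emb / np.linalg.norm(query_emb)
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    scores = v @ q
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

# Toy example with random embeddings standing in for the joint video-text model.
rng = np.random.default_rng(0)
idx, scores = retrieve(rng.normal(size=128), rng.normal(size=(100, 128)))
```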
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes from a limited number of support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features within a Transformer-like framework. Our key insights are twofold: first, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, from two aspects: feature-level and instance-level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on novel classes improves significantly over our strong baseline. Additionally, our framework can be easily extended to incremental FSIS with minor modifications. Benchmarking on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and models will be available.
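A minimal sketch of the general idea behind mask-based dynamic class centers, assuming masked average pooling of support features followed by a channel-wise sigmoid gate on query features; the shapes and the gating choice are illustrative assumptions, not the RefT module itself.

```python
import numpy as np

def dynamic_class_centers(support_feats, support_masks):
    """Masked average pooling of support features into a per-class center.

    support_feats : (K, C, H, W) features of K support images
    support_masks : (K, H, W) binary masks of the support objects
    """
    m = support_masks[:, None]                               # (K, 1, H, W)
    centers = (support_feats * m).sum(axis=(2, 3)) / (m.sum(axis=(2, 3)) + 1e-6)
    return centers.mean(axis=0)                              # (C,) class center

def reweight_query(query_feats, center):
    """Re-weight query features channel-wise by the dynamic class center."""
    w = 1.0 / (1.0 + np.exp(-center))                        # sigmoid gate per channel
    return query_feats * w[None, :, None, None]

support_feats = np.random.rand(3, 64, 32, 32)
support_masks = (np.random.rand(3, 32, 32) > 0.5).astype(float)
query_feats = np.random.rand(1, 64, 32, 32)
enhanced = reweight_query(query_feats, dynamic_class_centers(support_feats, support_masks))
```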
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do this work. We first discuss how AI can be used to enhance the results of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
As one of the most important psychological stress reactions, micro-expressions (MEs) are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Recognizing MEs automatically (MER) is therefore becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis, and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Although several spontaneous ME datasets have recently been released to alleviate this problem, the amount of available data remains small. To address this data scarcity, we construct a dynamic spontaneous ME dataset with the largest ME data scale to date, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced from 671 participants and annotated by more than 20 annotators over three years. We then apply four classical spatiotemporal feature learning models to DFME to perform MER experiments and objectively verify the validity of the DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate research on automatic MER and provides a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
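One common remedy for the class-imbalance issue mentioned above, shown here only as a generic sketch rather than the paper's specific solution, is to sample training clips with probabilities inversely proportional to class frequency.

```python
import numpy as np

def class_balanced_weights(labels):
    """Per-sample sampling probabilities inversely proportional to class frequency."""
    classes, counts = np.unique(labels, return_counts=True)
    freq = dict(zip(classes, counts / counts.sum()))
    w = np.array([1.0 / freq[y] for y in labels])
    return w / w.sum()

# Toy label distribution standing in for imbalanced ME categories.
labels = np.array(["happy"] * 50 + ["fear"] * 5 + ["disgust"] * 10)
p = class_balanced_weights(labels)
batch = np.random.default_rng(0).choice(len(labels), size=16, p=p)
```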
Face Anti-spoofing (FAS) is essential to secure face recognition systems against various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups, with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this scenario, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality, from three aspects: (1) an Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by incorporating a super-resolution network; (2) generated sample pairs are used to simulate quality variance distributions, helping the contrastive learning strategy obtain feature representations that are robust under quality variation; (3) a Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, extensive experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
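Point (2) can be illustrated with a generic InfoNCE-style loss that pulls together features of the same sample rendered at two quality levels; this is a standard contrastive formulation, not the exact CQIL objective, and the feature batches below are synthetic.

```python
import numpy as np

def quality_invariance_loss(feat_clean, feat_degraded, tau=0.1):
    """InfoNCE-style loss pulling together features of the same face captured at
    different image qualities and pushing apart different samples.

    feat_clean, feat_degraded : (N, d) feature batches where row i of both
    arrays comes from the same sample at two quality levels.
    """
    a = feat_clean / np.linalg.norm(feat_clean, axis=1, keepdims=True)
    b = feat_degraded / np.linalg.norm(feat_degraded, axis=1, keepdims=True)
    logits = a @ b.T / tau                        # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # positives sit on the diagonal

feat_hi = np.random.rand(8, 128)
feat_lo = feat_hi + 0.05 * np.random.rand(8, 128)   # simulated quality variation
loss = quality_invariance_loss(feat_hi, feat_lo)
```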
When using LiDAR semantic segmentation models for safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness with respect to a large range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise, and cross-device discrepancy. We then systematically investigate 11 LiDAR semantic segmentation models, spanning different input representations (e.g., point clouds, voxels, projected images), network architectures, and training schemes. Through this study, we obtain two insights: 1) the input representation plays a crucial role in robustness; under specific corruptions, different representations behave quite differently. 2) Although state-of-the-art LiDAR semantic segmentation methods achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on the above observations, we design a robust LiDAR segmentation model (RLSeg) which greatly boosts robustness with simple but effective modifications. We expect that our benchmark, comprehensive analysis, and observations can boost future research in robust LiDAR semantic segmentation for safety-critical applications.
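A generic evaluation harness for this kind of corruption benchmark (not the SemanticKITTI-C protocol itself) might compare clean-data mIoU against the average mIoU over corruption types, as sketched below with assumed data-loading and model interfaces.

```python
import numpy as np

def miou(pred, gt, num_classes):
    """Mean intersection-over-union for integer label arrays."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

def robustness_report(model, clean_set, corrupted_sets, num_classes):
    """Compare clean-data mIoU against the average mIoU over corruption types.

    clean_set / corrupted_sets : iterables of (points, labels) pairs;
    corrupted_sets maps a corruption name to such an iterable.
    """
    clean = np.mean([miou(model(x), y, num_classes) for x, y in clean_set])
    per_corruption = {name: np.mean([miou(model(x), y, num_classes) for x, y in s])
                      for name, s in corrupted_sets.items()}
    return {"clean_mIoU": clean,
            "corrupted_mIoU": float(np.mean(list(per_corruption.values()))),
            "per_corruption": per_corruption}

# Toy usage with a dummy "model" that predicts class 0 everywhere.
dummy = lambda pts: np.zeros(len(pts), dtype=int)
clean = [(np.zeros((100, 3)), np.zeros(100, dtype=int))]
corrupted = {"fog": [(np.zeros((100, 3)), np.ones(100, dtype=int))]}
print(robustness_report(dummy, clean, corrupted, num_classes=2))
```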
Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works use separate approaches to handle thing, stuff, and part predictions, without shared computation or task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework, Panoptic-PartFormer. Moreover, we find that the previous metric, PartPQ, is biased toward PQ. To handle both issues, we make the following contributions: first, we design a meta-architecture that decouples part features from thing/stuff features. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term this model Panoptic-PartFormer. Second, we propose a new metric, Part-Whole Quality (PWQ), to better measure the task from both pixel-region and part-whole perspectives; it can also decouple the errors of part segmentation and panoptic segmentation. Third, inspired by Mask2Former and based on our meta-architecture, we propose Panoptic-PartFormer++ with a new part-whole cross-attention scheme that further boosts part segmentation quality; the part-whole interaction is realized with masked cross attention. Finally, extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with Panoptic-PartFormer, Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost drop of 70% in GFlops and 50% in parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.
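A minimal single-head sketch of masked cross attention, where part queries attend only to the whole-level keys allowed by a boolean mask; the shapes, mask semantics, and names are illustrative and not the Panoptic-PartFormer++ implementation.

```python
import numpy as np

def masked_cross_attention(queries, keys, values, mask, scale=None):
    """Single-head masked cross-attention.

    queries : (Q, d) e.g. part queries
    keys    : (K, d) e.g. whole-level (thing/stuff) queries or pixel features
    values  : (K, d)
    mask    : (Q, K) boolean; True means "query q may attend to key k"
    """
    d = queries.shape[1]
    scale = scale or 1.0 / np.sqrt(d)
    logits = (queries @ keys.T) * scale
    logits = np.where(mask, logits, -1e9)         # block disallowed pairs
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values

q = np.random.rand(5, 32)       # part queries
k = v = np.random.rand(7, 32)   # whole-level queries
mask = np.random.rand(5, 7) > 0.3
out = masked_cross_attention(q, k, v, mask)
```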